Abstract

Phenotypes proximal to gene action generally reflect larger genetic effect sizes than those that are distant. The human
metabolome, a result of multiple cellular and biological processes, comprises functional intermediate phenotypes proximal to gene
action. Here, we present a genome-wide association study of 308 untargeted metabolite levels among African Americans
from the Atherosclerosis Risk in Communities (ARIC) Study. Nineteen significant common variant-metabolite associations
were identified, including 13 novel loci (p < 1.6×10^-10). These loci were associated with 7-50% of the difference in
metabolite levels per allele, and the variance explained ranged from 4% to 20%. Fourteen genes were identified within the
nineteen loci. Four loci contained non-synonymous substitutions in four enzyme-encoding genes (KLKB1, SIAE,
CPS1, and NAT8); the other significant loci comprise eight other enzyme-encoding genes (ACE, GATM, ACY3, ACSM2B,
THEM4, ADH4, UGT1A, TREH), a transporter gene (SLC6A13) and a polycystin protein gene (PKD2L1). In addition, four
potential disease-associated paths were identified, including two direct longitudinal predictive relationships: NAT8 with
N-acetylornithine, N-acetyl-1-methylhistidine and incident chronic kidney disease, and TREH with trehalose and incident
diabetes. These results highlight the value of using endophenotypes proximal to gene function to discover new insights
into biology and disease pathology.
Introduction

The power to detect genetic effects for complex traits is
influenced by, among other things, the study sample size and the
effect size of a particular locus. Most contemporary genome-wide
association studies (GWAS) have achieved increased power by
increasing the size of the discovery sample to tens of thousands of
individuals [1]. Besides expanding the sample size, focusing on
variants with large effects is an alternative strategy for novel gene
discovery. The human metabolome consists of a collection of small
molecules resulting from a variety of cellular and biologic processes,
the activity of which is regulated by coordinated enzyme action [2].
In addition, as metabolites reflect multiple metabolic and
physiological activities in the body, they hold promise for discovering
intermediate traits between gene action and disease processes [3].

GWASs of known risk factor phenotypes of clinical disease, such
as cholesterol or urate levels, have shown that genetic association
studies of functional intermediate traits, as opposed to the clinical
endpoint itself, are often more highly powered and may provide
insight into the biological mechanism of disease [4-7].
Untargeted metabolomic approaches simultaneously measure
numerous known and unknown metabolites present in a study
sample. Recent studies combining genetics and metabolomics
have identified multiple common variant-metabolite associations
with large effect sizes in populations of European ancestry, and
provided new functional insights into common complex disease
[8-11]. African ancestry-derived populations have higher levels of
genetic variation and population substructure, and lower levels of
linkage disequilibrium (LD), than European ancestry-derived
populations, so studies in African Americans may lead to
identification of new genes or variants and fine mapping of existing loci
[12-14]. To date, no such study has been conducted in African
Americans, a population that bears a disproportionate burden of
diseases such as cardiovascular disease, diabetes and chronic
kidney disease [15-17]. Our goal here is to identify common genetic
variations influencing the human metabolome in African Americans
in the Atherosclerosis Risk in Communities (ARIC) Study, in
order to reveal novel pathways underlying disease etiology and
possible avenues of disease prevention and treatment.

Author Summary

Most contemporary GWAS have achieved increased power by
increasing the size of the discovery
sample to tens of thousands of individuals. An alternative
approach for detecting the effects of novel loci is to
measure phenotypes that more immediately reflect the
effects of gene function. The metabolome consists of a
collection of small molecules resulting from a variety of
cellular and biologic processes, which can be considered
intermediate phenotypes proximal to gene function. Here,
we report a genome-wide association study identifying
nineteen genetic loci influencing untargeted metabolomic
traits among African Americans in the Atherosclerosis Risk
in Communities (ARIC) Study. Fourteen genes mapped
within the nineteen loci, including twelve enzyme-encoding
genes (KLKB1, SIAE, CPS1, NAT8, ACE, GATM, ACY3, ACSM2B,
THEM4, ADH4, UGT1A and TREH), a transporter gene
(SLC6A13) and a polycystin protein gene (PKD2L1). In
addition, four potential disease-associated paths were
identified, including two direct longitudinal predictive
relationships: NAT8 with N-acetylornithine,
N-acetyl-1-methylhistidine and incident chronic kidney disease, and
TREH with trehalose and incident diabetes. These results
highlight the value of using phenotypes proximal to gene
function to promote novel gene discovery.

Results

A total of 308 known serum metabolites, including 83 amino
acids, 16 carbohydrates, 9 cofactors and vitamins, 7 energy metabolites, 136
lipids, 12 nucleotides, 25 peptides and 20 xenobiotics (Table S1),
were included, and a set of 2,341,704 common autosomal SNPs
was tested against the level of each metabolite in 1,260 African
Americans (demographics in Table S2). Nineteen significant
(p-value < 1.6×10^-10 after correction for multiple testing) common
variant-metabolite associations were identified (locus association
summaries are presented in Table 1; regional association plots
and quantile-quantile plots are presented in Figures S1 and S2,
respectively), including 13 novel loci that have not been reported
in previous metabolomics studies. Depending on the particular
metabolite, these loci were associated with 7-50% of the difference
in metabolite levels per allele (average 25%), and the variance
explained ranged from 4% to 20%.

Fourteen genes were mapped within the nineteen significant
genetic loci; eight of them encode enzymes that catalyze a
reaction in which the corresponding metabolite is a substrate or product
(gene names shown in red in Figure 1). Four of the associated loci
contained non-synonymous substitutions in four enzyme-encoding
genes (KLKB1, SIAE, CPS1, and NAT8). The other significant loci
comprise eight other enzyme-encoding genes (ACE, GATM, ACY3,
ACSM2B, THEM4, ADH4, UGT1A, and TREH), a transporter
gene (SLC6A13) and a polycystin protein gene (PKD2L1). Two
protease-encoding genes, ACE and KLKB1, showed pleiotropic
effects on multiple oligopeptide metabolites, and the
UDP-glucuronosyltransferase gene, UGT1A, contributed to the levels
of several bile pigments (Figure 1).

The nineteen significant common variant-metabolite associations
were compared with previously published SNP-metabolite
associations in Caucasians [10]. Eleven of the nineteen metabolites were
shared between the published study and the data presented here,
and six of them showed the same significant SNP-metabolite
associations in both ethnicities (Table 2). A CPS1-glycine
association was reported in the Caucasian metabolomic GWAS,
but the sentinel SNP was different (r^2 < 0.5) from that reported
here (Table 2). A CPS1-glycine association was also reported in a
recent genetic study of glycine metabolism among Caucasians
[18]. The other four shared metabolites had different signals in
African Americans compared with Caucasians (Table S3).
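
The r^2 comparison between sentinel SNPs can be illustrated with a minimal sketch. It assumes r^2 is approximated as the squared Pearson correlation between per-individual allele dosages (a common genotype-based approximation to haplotype-based LD); the function name and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean


def ld_r2(dosage_a, dosage_b):
    """Squared Pearson correlation between two SNPs' allele dosages (0 to 2).

    Illustrative approximation only: genotype-based r^2, not the
    haplotype-based LD statistic a dedicated tool would compute.
    """
    if len(dosage_a) != len(dosage_b):
        raise ValueError("dosage vectors must have the same length")
    mean_a, mean_b = mean(dosage_a), mean(dosage_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosage_a, dosage_b))
    var_a = sum((a - mean_a) ** 2 for a in dosage_a)
    var_b = sum((b - mean_b) ** 2 for b in dosage_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return (cov * cov) / (var_a * var_b)


# Toy data: two SNPs whose dosages are uncorrelated across four individuals.
independent = ld_r2([0, 0, 2, 2], [0, 2, 0, 2])
# Toy data: two SNPs that tag each other perfectly (opposite allele coding).
perfect = ld_r2([0, 1, 2, 1], [2, 1, 0, 1])
```

Under this approximation, two sentinel SNPs with r^2 below 0.5 would be treated as different signals at the same locus, as in the CPS1-glycine comparison.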



We identified a missense mutation in NAT8 (rs13538) that was
significantly associated with N-acetylornithine levels
(p = 4.0×10^-66). A recent biochemical study has shown that NAT8
catalyzes the N-acetylation of cysteine conjugates [19]. We next
asked whether the presumed specificity of NAT8's function could
be used to determine the identity of any unknown metabolites, by
analyzing its effect on 294 unknown metabolites. Two metabolites,
X-11333 and X-11787, reached our a priori defined level of
significance (p = 1.0×10^[illegible] and p = 2.5×10^[illegible], respectively). By
targeted mass spectroscopy, X-11333 was determined to be
N-acetyl-1-methylhistidine (Figure S3), a type of N-acetyl amino
acid; and X-11787 was an isoform of either hydroxy leucine or
isoleucine, as reported previously [20].

Among the nineteen metabolites that reached genome-wide
significance, we identified four potential disease-associated paths among
African Americans for cardiovascular disease, chronic kidney
disease (CKD) and diabetes, including two direct longitudinal
associations (Figure 2; detailed estimates in Table S4). As
described above, a missense mutation in NAT8 (rs13538), a known
susceptibility locus for chronic kidney disease [21], was
significantly associated with N-acetylornithine and
N-acetyl-1-methylhistidine levels. We identified a pronounced relationship of both
N-acetylornithine and N-acetyl-1-methylhistidine levels with
kidney function: higher levels of N-acetylornithine
and N-acetyl-1-methylhistidine were related to lower eGFR
(p = 9.0×10^[illegible] and 1.6×10^[illegible], respectively) and to a higher risk of
incident CKD after an average of 19 years of follow-up among 1,921
African Americans (demographics in Table S5; HR = 1.64,
p = 0.003 and HR = 1.34, p = 0.03, respectively). However, the
longitudinal associations with the metabolites were attenuated and
no longer significant after further adjustment for eGFR (data not
shown). Finally, trehalose levels were significantly associated with
TREH gene variation. Trehalose can be cleaved into two molecules
of glucose. In this study, trehalose levels were significantly
associated with glucose levels (p = 2.9×10^[illegible]) and with a
1.34-fold increased risk of incident diabetes after an average of 7
years of follow-up (p = 2.0×10^[illegible]) in a sample of 1,430 ARIC
African Americans (demographics in Table S5). After further
adjustment for glucose levels, trehalose levels remained
associated with incident diabetes, although the effect
size was smaller (HR = 1.16, p = 0.02).

Discussion 

By combining high-throughput metabolomic and genomic
technologies, we identified nineteen common variant-metabolite
associations among African Americans with p-values ranging from
6.0×10^-11 to 4.0×10^-66. We inferred the structure of an unknown
metabolite to be N-acetyl-1-methylhistidine using knowledge of the
associated gene's function and targeted mass spectroscopy. We
further established potential novel disease-associated pathways for
cardiovascular disease risk factors, CKD and diabetes. The results
offer new evidence about the genetic impact on metabolites and
disease among African Americans, which advances our
understanding of disease causation and progression.

Most loci identified by GWA studies of complex disease traits
contribute relatively small effects, and the variance explained
remains modest [14,22,23]. Thus, contemporary GWAS are
shifting focus to phenotypes that more immediately reflect the
effects of gene action. For example, although the effect sizes of
genetic loci related to coronary heart disease (CHD) are
relatively small (OR from 1.08 to 1.47) [24-26], loci related to
plasma triglyceride and cholesterol levels explained a meaningful
proportion of the variance (9-13%) [4]. The human metabolome,
the ultimate downstream product of gene and environment
interaction, holds the promise to identify genes that directly reflect
gene action with large effect sizes [8,10,27]. Our results show
relatively large effect sizes for the nineteen identified genetic loci related
to the human metabolome among African Americans (on average a 25%
shift per allele copy). In addition, the majority of identified loci
(15/19) are located in or near genes, and these loci explained up to
20% of the variance of each trait.

[Figure 1: hexagon diagram. Super pathway labels: Cofactors and vitamins; Peptide; Carbohydrate; Lipid; Amino acid. Contents of the nineteen hexagons, in the order extracted:]

Metabolite                Gene      Sentinel SNP   Risk allele   Effect size   Variance explained
bilirubin (E,E)           UGT1A     rs887829       T             0.32          7%
3-hydroxydecanoate        THEM4     rs10788817     G             0.11          5%
bilirubin (Z,Z)           UGT1A     rs887829       T             0.34          5%
biliverdin                UGT1A     rs887829       T             0.32          8%
palmitoleate (16:1n7)     PKD2L1    rs603424       G             0.11          5%
deoxycarnitine            SLC6A13   rs555044       A             0.07          4%
[H]HWESASLLR[OH]          ACE       rs4343         A             0.35          7%
acetylcarnitine           SIAE      rs12282107     C             0.29          6%
hexadecanedioate          ADH4      rs17028615     A             0.19          7%
aspartylphenylalanine     ACE       rs4343         G             0.22          8%
threonylphenylalanine     ACE       rs4363         G             0.21          6%
glycine                   CPS1      rs7422339      A             0.10          5%
N-acetylornithine         NAT8      rs13538        A             0.30          20%
HXGXA                     KLKB1     rs3733402      A             0.44          9%
Leu-Phe                   KLKB1     rs3733402      A             0.28          8%
creatine                  GATM      rs2433610      T             0.09          5%
trehalose                 TREH      rs507080       A             0.51          10%
N-acetylphenylalanine     ACY3      rs12288023     C             0.27          5%
phenylacetate             ACSM2B    rs7499271      A             0.25          5%

Figure 1. Genome-wide significant loci and human metabolic traits among African Americans in ARIC. Each hexagon shows the
significant genetic locus (p < 1.6×10^-10) and the corresponding metabolite. The gene name listed in a hexagon is mapped by the sentinel SNP, and
the closest gene is picked if the sentinel SNP is not located in a gene but is in linkage disequilibrium (r^2 ≥ 0.8) with other SNPs in a nearby gene.
Metabolites are grouped by super pathway, indicated in different colors. A red border line indicates that this gene-metabolite pair has been
previously reported, and a gene name in red indicates the gene encodes an enzyme that catalyzes the reaction of the corresponding metabolite as a
substrate or product. Rs number, risk allele, effect size and variance explained for the sentinel SNP are listed in parentheses.
doi:10.1371/journal.pgen.1004212.g001

Table 2. A comparison of significant common variant-metabolite associations among the ARIC, KORA and TwinsUK studies.

Metabolite              ARIC top SNP                 ARIC P      KORA SNP                KORA P       TwinsUK SNP               TwinsUK P
aspartylphenylalanine   rs4343 ACE (synonymous)      9×10^-25    rs4343                  2×10^-10     rs4343                    2×10^-10
N-acetylornithine       rs13538 NAT8 (missense)      4×10^-66    rs6745480 (r^2 = 1)     3×10^-123    rs10496191 (r^2 = 0.95)   2×10^-65
palmitoleate (16:1n7)   rs603424 PKD2L1 (intron)     1×10^-11    rs603424                1×10^-7      -                         -
bilirubin (E,E)         rs887829 UGT1A (intron)      1×10^-17    rs887829                3×10^-24     rs887829                  5×10^-5
bilirubin (Z,Z)         rs887829 UGT1A (intron)      6×10^-13    rs887829                1×10^-46     rs887829                  4×10^-8
biliverdin              rs887829 UGT1A (intron)      8×10^-23    rs887829                5×10^-47     -                         -
glycine                 rs7422339 CPS1 (missense)    4×10^-12    rs2371015 (r^2 < 0.5)   3×10^-9      rs4673553 (r^2 < 0.5)     2×10^-23

doi:10.1371/journal.pgen.1004212.t002

[Figure 2: pathway diagram with four columns: Gene; Metabolite; Risk factor; Clinical endpoint. Clinical endpoint boxes: Cardiovascular disease (e.g. hypertension, stroke, et al.); Incident Coronary Heart Disease; Incident Chronic Kidney Disease; Incident Type 2 Diabetes.]

Figure 2. Pathways among gene, metabolite, risk factor and disease identified among ARIC African Americans. Solid arrows between
genes and metabolites indicate genome-wide significant effects (p < 1.6×10^-10). Arrows between metabolites and risk factors indicate significant
linear associations after adjusting for age and gender (p < 0.05). Arrows between metabolites and clinical endpoints indicate significant associations
after adjusting for age, gender and other risk factors using Cox proportional hazards modeling (p < 0.05). The dotted arrows between risk factors and
clinical endpoints indicate well-established relationships. DBP indicates diastolic blood pressure and eGFR, estimated glomerular filtration rate.
doi:10.1371/journal.pgen.1004212.g002

Twelve of the fourteen genes that were significantly associated
with metabolite levels were enzyme-encoding genes, including four
genes involved in disease-associated processes. These data
underscore the important role of enzyme activity and regulation in
controlling metabolite levels. As metabolite levels are closely
related to disease processes, understanding whether the underlying
mutations detected here lead to gain or loss of function
in these enzyme-encoding genes offers new opportunities for
disease treatment and prevention (e.g., designing an antagonist or
agonist of the gene product as a drug candidate). The majority of the
gene-metabolite associations are consistent with the gene's known
function, but the direction of effect of the coded allele does not
provide direct evidence as to whether the variant represents
gain of function or loss of function. Future investigation of the
functional impact of the underlying causal variants is critical and is
an area of intense research.

NAT8 is expressed mainly in the kidney and liver [28], but its
function is not fully understood. Several previous, seemingly
unrelated, observations have found that mutations in
N-acetyltransferase 8 (NAT8) contribute to N-acetylornithine levels,
creatinine levels, kidney function and CKD [10,21,29,30]. Our
results show that an amino acid substitution in NAT8 is related to
N-acetylornithine, N-acetyl-1-methylhistidine and eGFR, which
in turn influence the risk of incident CKD. These findings provide
evidence that N-acetylation plays a role in the development of
CKD [10].

Trehalose is a food ingredient with the ability to prevent protein
denaturation [31]. Because of its ability to inhibit lipid and protein
misfolding, trehalose has become a potential therapeutic in
neurodegenerative studies [32,33]. Animal safety studies
concluded that trehalose is safe for use as an ingredient in consumer
products [34], and it is now widely used in food and cosmetics.
Here, we report that trehalose levels are regulated by TREH,
which encodes trehalase, the enzyme that hydrolyzes trehalose to
two glucose molecules. In addition, we show that trehalose is
associated with glucose levels and with the onset of diabetes.

Environmental factors (e.g. dietary intake), in addition to and
interacting with genetic factors, explain part of the variability of the human
metabolome. Follow-up investigations of the interactions between
the genes identified here and possible environmental factors are
likely to provide new insight into disease
etiology and metabolism. For example, alcohol dehydrogenase
4 (ADH4) contributes to esophageal squamous-cell carcinoma
(ESCC) through an interaction with alcohol consumption [35].
Here, we report that ADH4 is associated with hexadecanedioate
levels, a metabolite with antitumor activity [36]. Moreover,
studies have shown that coffee consumption is associated with
lower bilirubin levels [37], and UGT1A contributes to bilirubin
levels as well [10]. Our data show that mutations in UGT1A are
associated with the levels of several bile pigments. Thus, future
investigations of how genes related to metabolite levels interact
with the environment are of interest.

Untargeted metabolomics approaches simultaneously measure numerous
known and unknown metabolites present in a sample.
Since the chemical identities of unknown metabolites
have not been elucidated, previous GWAS of metabolomic traits
largely excluded unknown metabolites from the analysis. In our study,
we show an example of unknown metabolite identification (i.e.
X-11333) by combining GWAS results (i.e. NAT8) with existing
knowledge about the function of the gene product (i.e.
N-acetylation). A recent study has used GWAS results and Gaussian
graphical modeling to predict unknown metabolite identities [38].
These two examples demonstrate the feasibility of identifying the
structure of unknown compounds by combining genetic and
metabolomics information.

Limitations of this study warrant consideration. To our
knowledge, the ARIC study is the only cohort with serum
metabolome measurements in African Americans, so an independent
sample for replication is unlikely to be found. In our study, the
SNP-metabolite associations identified were compared with the
results from a published study in Caucasians [10] as a surrogate
replication. Six distinct SNP-metabolite associations were
replicated out of eleven shared metabolites, indicating homogeneous
genetic effects on several metabolites regardless of ethnicity.
Differences in the site frequency spectrum between African
Americans and Caucasians and lower LD in African Americans
may explain the lack of significant association at the other loci. As
a consequence of the lack of replication, the proportion of variance
explained by the SNPs was reported from the discovery
sample, which may be an over-estimate. Future studies are needed
to replicate our findings in independent samples of African
Americans. Despite these limitations, the data presented here have
important strengths. Previously published GWAS of human
metabolites estimated only cross-sectional relationships between
metabolites and clinical endpoints. In contrast, the data presented here
originate from a large, well-defined, longitudinal cohort study,
allowing longitudinal predictive relationships to be established.

In summary, we report here the first genome-wide association
study of the untargeted metabolome in African Americans. The
genetic variant-metabolite associations, along with the disease paths
reported here, will continue to be refined with further use of
contemporary omics technologies. Our study highlights the value
of utilizing omics studies in deeply phenotyped individuals to
provide new insights into gene function, disease etiology and
epidemiology.

Methods 

Study Population 

The Atherosclerosis Risk in Communities (ARIC) study is a
longitudinal cohort study designed to ascertain the etiology and
predictors of cardiovascular disease (CVD). The ARIC study
enrolled 15,792 middle-aged adults from four U.S. communities
(Forsyth County, NC; Jackson, MS; suburbs of Minneapolis, MN;
and Washington County, MD) in 1987-89 and followed them through
four completed visits, approximately three years apart, in
1987-89, 1990-92, 1993-95, and 1996-98. In general, each visit
included interviews and a physical examination. A detailed
description of the ARIC study design and methods has been published
elsewhere [39]. Metabolomic profiles were measured in baseline
serum from 1,977 African Americans selected from the Jackson,
MS field center. Participants were excluded if they did not give
consent for use of DNA information.

Assessment of Metabolomic Profiles 

Metabolite profiling was completed in June 2010 using fasting
serum samples that had been stored at -80°C since collection at
the baseline examination in 1987-1989. In total, detection and
quantification of 602 metabolites was completed by Metabolon
Inc. (Durham, USA) using an untargeted, gas chromatography-mass
spectrometry and liquid chromatography-mass spectrometry
(GC-MS and LC-MS)-based metabolomic quantification protocol
[40,41]. Prior to the analyses presented here, a rigorous assessment
of the metabolomic data was done. Metabolites were excluded if
1) more than 50% of the samples had values below the detection
limit; or 2) they had unknown chemical structures, except for
X-11333 and X-11787, which were followed up as part of more
detailed NAT8 investigations. After this assessment, a total of 308
named metabolites were included in the present study. Structural
identifications for X-11333 and X-11787 were proposed using a
mass spec-based structural approach, including targeted accurate
mass and MS^n fragmentation with accurate mass [41].
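
The two exclusion criteria above can be expressed as a small filter. This is a minimal sketch, not the study's actual pipeline: it assumes that a value below the detection limit is stored as `None`, and the function and constant names are hypothetical.

```python
# The two unknown compounds the text says were retained for follow-up.
FOLLOWED_UP_UNKNOWNS = {"X-11333", "X-11787"}


def keep_metabolite(name, values, is_named, max_below_fraction=0.5):
    """Apply the two exclusion rules to one metabolite.

    values: per-sample levels, with None marking "below detection limit"
            (an assumed encoding, for illustration only).
    is_named: True if the chemical structure is known.
    """
    if not values:
        return False
    below = sum(1 for v in values if v is None)
    # Rule 1: drop if MORE than 50% of samples are below the detection limit.
    if below / len(values) > max_below_fraction:
        return False
    # Rule 2: drop unknown structures, except the followed-up compounds.
    return is_named or name in FOLLOWED_UP_UNKNOWNS


# Toy usage with made-up values.
kept = keep_metabolite("glycine", [1.2, None, 0.8, None], is_named=True)
dropped = keep_metabolite("glycine", [None, None, None, 0.8], is_named=True)
```

Note that exactly 50% of samples below the detection limit still passes, because the stated rule is "more than 50%".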

Genotyping and Imputation 

In the present study, common (minor allele frequency,
MAF ≥ 5%) autosomal single-nucleotide polymorphisms (SNPs)
were genotyped on the Affymetrix 6.0 chip and were imputed to
2,341,704 SNPs based on a panel of cosmopolitan reference
haplotypes from HapMap CEU and YRI. MACH v1.0 was used
for imputation, and allele dosage information was summarized in
the imputation results. SNPs were excluded before imputation if
they had no chromosomal location, were monomorphic, had a call
rate < 95%, or had a Hardy-Weinberg equilibrium p-value < 10^[illegible].
For each SNP, the ratio of the observed versus expected variance
of the dosage served as a measure of imputation quality.
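
As a rough illustration of the imputation-quality measure, the sketch below computes the ratio of observed to expected dosage variance for one SNP. The text does not give the formula; the sketch assumes the commonly used definition in which the expected variance is the binomial variance 2p(1-p) under Hardy-Weinberg equilibrium, with p estimated from the mean dosage. The function name is hypothetical.

```python
def dosage_variance_ratio(dosages):
    """Observed / expected variance of imputed allele dosages (each 0 to 2).

    Assumption: expected variance is 2p(1-p) under Hardy-Weinberg
    equilibrium, with allele frequency p = mean dosage / 2.
    Values near 1 suggest confident imputation; values near 0 suggest
    dosages that collapse toward the population mean.
    """
    n = len(dosages)
    if n == 0:
        raise ValueError("no dosages supplied")
    mean_dosage = sum(dosages) / n
    p = mean_dosage / 2.0
    expected = 2.0 * p * (1.0 - p)
    if expected == 0:
        raise ValueError("ratio is undefined for a monomorphic SNP")
    observed = sum((d - mean_dosage) ** 2 for d in dosages) / n
    return observed / expected


# Toy data: hard genotype calls in Hardy-Weinberg proportions for p = 0.5.
well_imputed = dosage_variance_ratio([0, 1, 1, 2])
# Toy data: every individual imputed to the same uncertain dosage.
poorly_imputed = dosage_variance_ratio([1.0, 1.0, 1.0, 1.0])
```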
Genome-Wide Association Analyses

A total of 308 metabolites were included in this analysis.
Metabolite levels below the detectable limit of the assay were
imputed with the lowest detected value for that metabolite in all
samples, and all metabolite values were natural log-transformed
prior to the analyses. Linear regression with an additive genetic
model was applied to each metabolite, adjusting for age, sex and
the first 10 principal components. The significance threshold was
defined as a p-value < 1.6×10^-10 (5.0×10^-8/308) based on
Bonferroni correction. SNPs with MAF < 5% were excluded.
Quantile-quantile (QQ) plots were generated for each analysis to
illustrate the distribution of the observed and expected p-values for
all eligible SNPs. Regional plots showing LD and the location of
nearby genes (if any) were generated for the top-ranking SNPs for
each metabolite. If more than one significant SNP clustered at a
locus, the SNP with the smallest p-value was reported as the
sentinel marker. All analyses were performed using ProbABEL
and R (www.r-project.org). The identified sentinel SNPs were
further compared with the metabolite-SNP associations from the
KORA and TwinsUK studies [10] using their public GWAS
server (http://metabolomics.helmholtz-muenchen.de/gwa/index.html)
and with other published GWA studies through the NHGRI GWAS
Catalog (http://www.genome.gov/gwastudies/).
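
The main steps of this analysis can be sketched in a few lines. This is a simplified illustration, not the ProbABEL/R workflow used in the study: covariate adjustment (age, sex, principal components) is omitted, below-detection values are assumed to be stored as `None`, and all function names are hypothetical.

```python
import math

GENOME_WIDE_ALPHA = 5.0e-8
N_METABOLITES = 308
# Bonferroni-corrected threshold, approximately 1.6e-10.
THRESHOLD = GENOME_WIDE_ALPHA / N_METABOLITES


def preprocess(levels):
    """Replace below-detection values (None) with the lowest detected
    value for that metabolite, then natural log-transform."""
    detected = [v for v in levels if v is not None]
    if not detected:
        raise ValueError("no detected values for this metabolite")
    floor = min(detected)
    return [math.log(floor if v is None else v) for v in levels]


def additive_slope(dosages, log_levels):
    """Per-allele effect from a simple (unadjusted) linear regression of
    log metabolite level on allele dosage. Covariates are omitted here."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(log_levels) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("slope is undefined for a monomorphic SNP")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, log_levels))
    return sxy / sxx


def sentinel(snp_pvalues):
    """Among SNPs at one locus, return (snp, p) for the smallest p-value
    below the threshold, or None if no SNP is significant."""
    significant = {s: p for s, p in snp_pvalues.items() if p < THRESHOLD}
    if not significant:
        return None
    best = min(significant, key=significant.get)
    return best, significant[best]


# Toy usage with made-up numbers.
logs = preprocess([math.e, None, math.e ** 2])
beta = additive_slope([0, 1, 2], [1.0, 1.5, 2.0])
top = sentinel({"rsA": 1e-12, "rsB": 3e-11, "rsC": 1e-5})
```

A full analysis would fit the adjusted model for every SNP-metabolite pair and derive p-values from the regression test statistic; the sketch only shows how the preprocessing, the additive coding and the threshold fit together.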

Disease Association Analyses 

Analyses including all African-American samples with
metabolomic data were conducted to estimate the association between
genome-wide significant metabolite levels and relevant clinical risk
factors and endpoints, including incident chronic kidney disease
and incident type 2 diabetes. Nine associations, including six
cross-sectional associations with clinical risk factors and three
longitudinal associations with clinical endpoints, were tested. In each
analysis, metabolite levels were natural log-transformed. The
cross-sectional associations were assessed using linear regression
with adjustment for age and gender. Longitudinal associations
with disease endpoints were estimated using Cox proportional
hazards models, adjusting for age, gender, systolic blood pressure
(SBP), antihypertensive medication use, diabetes, high-density
lipoprotein, low-density lipoprotein, current smoking and
prevalent CHD for the incident CKD analysis; and for age, gender, SBP,
antihypertensive medication use, body mass index and total
cholesterol for the incident type 2 diabetes analysis. The proportional
hazards assumption was examined and not rejected using the
methods developed by Grambsch and Therneau [42]. Covariates
were measured at baseline (1987-1989) and The Chronic Kidney